Understanding Data Manipulation and How to Leverage it To Improve Generalization
Augmentations and other transformations of data, either in the input or latent space, are a critical component of modern machine learning systems. While these techniques are widely used in practice and known to improve generalization in many cases, it is still unclear how data manipulation impacts learning and generalization. To take a step toward addressing this problem, this thesis focuses on understanding and leveraging data augmentation and alignment to improve machine learning performance and transfer. In the first part of the thesis, we establish a novel theoretical framework for understanding how data augmentation (DA) impacts learning in linear regression and classification tasks. The results demonstrate that the spectrum of the augmented (transformed) data plays a key role in characterizing the behavior of different augmentation strategies, especially in the overparameterized regime. The tools developed in this aim provide simple guidelines for building new augmentation strategies and a simple framework for comparing the generalization of different types of DA. In the second part of the thesis, we demonstrate how latent data alignment can be used to tackle the domain transfer problem, where training and testing datasets differ in distribution. Our algorithm builds upon joint clustering and data matching through optimal transport, and outperforms pure matching baselines on both synthetic and real datasets. Extensions of the generalization analysis and algorithm design for data augmentation and alignment to nonlinear models, such as artificial neural networks and random feature models, are discussed. This thesis provides tools and analyses for better data manipulation design, which benefit both supervised and unsupervised learning schemes.
Ph.D.
Little String Amplitudes (and the Unreasonable Effectiveness of 6D SYM)
We study tree-level scattering amplitudes of four massless states in the
double scaled little string theory, and compare them to perturbative loop
amplitudes in six-dimensional super-Yang-Mills theory. The little string
amplitudes are computed from correlators in the cigar coset CFT and in N=2
minimal models. The results are expressed in terms of integrals of conformal
blocks and evaluated numerically in the alpha' expansion. We find striking
agreements with up to 2-loop scattering amplitudes of massless gluons in 6D
SU(k) SYM at a Z_k invariant point on the Coulomb branch. We comment on the
issue of UV divergence at higher loop orders in the gauge theory and discuss
the implication of our results.
Comment: 58 pages, 5 figures, 3 tables, comments added, references added
Topological Defect Lines and Renormalization Group Flows in Two Dimensions
We consider topological defect lines (TDLs) in two-dimensional conformal
field theories. Generalizing and encompassing both global symmetries and
Verlinde lines, TDLs together with their attached defect operators provide
models of fusion categories without braiding. We study the crossing relations
of TDLs, discuss their relation to the 't Hooft anomaly, and use them to
constrain renormalization group flows to either conformal critical points or
topological quantum field theories (TQFTs). We show that if certain
non-invertible TDLs are preserved along an RG flow, then the vacuum cannot be a
non-degenerate gapped state. For various massive flows, we determine the
infrared TQFTs completely from the consideration of TDLs together with modular
invariance.
Comment: 101 pages, 63 figures, 2 tables; v3: minor changes, added footnotes
and references, published version
- …